%\NeedsTeXFormat{LaTeX2e}[1996/06/01]

\documentclass[12pt]{article}
%\usepackage[rightcaption,raggedright]{sidecap}% for side captions
%\usepackage{framed}         % for floatingboxes
%\usepackage{soul}           % for letterspacing in theorem-style headings

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Special packages used AJD
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\usepackage{amsmath}

% for the Harvard author-date referencing system
%\usepackage[agsm]{harvard}

% if you are using either vancouver.bst or IEEEtran.bst and wish to remove
% square braces in the reference list, uncomment the line below
% \removesquarebraces

%\usepackage{rotating}
%\usepackage{floatpag}
%\rotfloatpagestyle{empty}

% \usepackage{amsmath}% if you are using this package,
                      % it must be loaded before amsthm.sty
%\usepackage{amsthm}
% \usepackage{txfonts}% times font (used to produce EngCguide.pdf)
                      % this package must be loaded after amsthm.sty
\usepackage{graphicx}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Special packages used AJD
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\usepackage{listings}

%\lstset{
%stringstyle=\ttfamily, % typewriter type for strings
%showstringspaces=false % no special string spaces
%}
%\lstset{numbers=left, numberstyle=\tiny, stepnumber=1, numbersep=10pt, frame=trbl}
%\lstset{escapeinside={(*@}{@*)}}

%\usepackage{tikz,pgf}
  
%\usetikzlibrary{positioning}


%\def\sindex{}
%\def\aindex{}

  
% BEAST book specific commands
\newcommand{\BEASTVersion}{2.2.x}
\newcommand{\TracerVersion}{1.6}
\newcommand{\FigTreeVersion}{1.4.2}
%\newcommand{\TODO}[1]{{\color{red} #1}}
%\newcommand{\TODO}[1]{}
%\newcommand{\chapter}[1]{}


% see chapter 3 for details
%\theoremstyle{plain}% default
%\newtheorem{theorem}{Theorem}
%\newtheorem{lemma}[theorem]{Lemma}
%\newtheorem*{corollary}{Corollary}

%\theoremstyle{definition}
%\newtheorem{definition}[theorem]{Definition}
%\newtheorem{condition}[theorem]{Condition}

%\theoremstyle{remark}
%\newtheorem{remark}{Remark}
%\newtheorem{notation}[remark]{Notation}
%\newtheorem*{case}{Case}

%\hyphenation{line-break line-breaks docu-ment triangle cambridge amsthdoc
%cambridgemods baseline-skip author authors cambridgestyle en-vir-on-ment polar}

%\setcounter{tocdepth}{2}% the toc normally lists sections;
% for the purposes of this document, this has been extended to subsections

%hyperref should come after index to make pagerefs work in index
\usepackage{hyperref}
\hypersetup{
    colorlinks=true,
    linkcolor=blue,
    filecolor=magenta,      
    urlcolor=cyan,
}

%\usepackage[all]{xy}

\newcommand{\includeimage}[2][]{%
%HEVEA\imgsrc{#2.hevea.png}%
%BEGIN LATEX
\includegraphics[#1]{#2}
%END LATEX
}
\newcommand{\chainLength}{{6,000,000}}
\newcommand{\logEvery}{{1,000}}
\newcommand{\tracesNumber}{{6,000}} %% chainLength/logEvery
\newcommand{\screenEvery}{{10,000}}
\newcommand{\mccTree}{{\texttt{Primates.MCC.tree}}}

\title{Divergence Dating Tutorial with BEAST {\BEASTVersion}}
\author{Alexei Drummond, Andrew Rambaut, Remco Bouckaert, and Walter Xie}

\begin{document}
\maketitle

\section{Introduction}

This tutorial introduces the BEAST software for Bayesian evolutionary analysis through a simple tutorial. The tutorial involves co-estimation of a gene phylogeny and associated divergence times in the presence of calibration information from fossil evidence. 

You will need the following software at your disposal:

\begin{itemize}

\item {\bf BEAST} - this package contains the BEAST program, BEAUti, TreeAnnotator and other utility programs. This tutorial is written for BEAST v{\BEASTVersion}, which has support for multiple partitions. It is available for download from \texttt{http://www.beast2.org/}.
\item {\bf Tracer} - this program is used to explore the output of BEAST (and other Bayesian MCMC programs). It graphically and
quantitively summarizes the distributions of continuous parameters and provides diagnostic information. At the time of
writing, the current version is v{\TracerVersion}. It is available for download from\\*\texttt{http://tree.bio.ed.ac.uk/software/}.
\item {\bf FigTree} - this is an application for displaying and printing molecular phylogenies, in particular those obtained using
BEAST. At the time of writing, the current version is v{\FigTreeVersion}. It is available for download from \texttt{http://tree.bio.ed.ac.uk/software/}.
\end{itemize}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%
%%% TUTORIAL - RATES AND DATES
%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%\section{Rates and dates}

This tutorial will guide you through the analysis of an alignment of sequences sampled from twelve primate species (see Figure \ref{fig:primateAlignment}). The goal is to estimate the phylogeny, the rate of evolution on each lineage and the ages of the uncalibrated
ancestral divergences. 

\begin{figure}	
\centering
\includeimage[width=\textwidth]{figures/AlignmentViewer}
\caption{Part of the alignment for primates.}
\label{fig:primateAlignment}
\end{figure}

The first step will be to convert a NEXUS file with a DATA or CHARACTERS block into a BEAST XML input file. This is done using the program BEAUti (which stands for Bayesian Evolutionary Analysis Utility). 
This is a user-friendly program for setting the evolutionary model and options for the MCMC analysis. 
The second step is to actually run BEAST using the input file generated by BEAUTi,  which
contains the data, model and analysis settings. 
The final step is to explore the output of BEAST in order to diagnose problems and to summarize the results.

\section{BEAUti}

The program BEAUti is a user-friendly program for setting the
model parameters for BEAST. Run BEAUti by double clicking on its icon. Once running, \texttt{BEAUti} will look similar irrespective
of which computer system it is running on. For this tutorial, the Mac OS X version is used in the figures but
the Linux and Windows versions will have the same layout and functionality.

%\TODO{Provide instructions for executing BEAUti  in a Linux environment.}

\subsection{Loading the NEXUS file }

To load a NEXUS format alignment, simply select the \texttt{Import Alignment...} option from the \texttt{File} menu, or drag the file into the middle of {\bf Partitions} panel. 

The example file called \href{https://github.com/CompEvol/beast2/blob/master/examples/nexus/primate-mtDNA.nex?raw=true}{primate-mtDNA.nex} is available from the {\tt examples/nexus/} directory for Mac and Linux and  {\tt examples/nexus/}for Windows inside the directory where BEAST was installed. Or click the link to download the data. After the data is opened in your web browser, right click mouse and save it as \texttt{primate-mtDNA.nex}.
This file contains an alignment of sequences of 12 species of primates. 

%It looks like this (the lines have been truncated):
%
%\begin{verbatim}
%#NEXUS
%begin data;
%dimensions ntax=12 nchar=898;
%format datatype=dna interleave=no gap=-;
%matrix
%Tarsius_syrichta  AAGTTTCATTGGAGCCACCACTCTTATAATTGCCCATGGCCTCACC
%Lemur_catta       AAGCTTCATAGGAGCAACCATTCTAATAATCGCACATGGCCTTACA
%Homo_sapiens	     AAGCTTCACCGGCGCAGTCATTCTCATAATCGCCCACGGGCTTACA
%Pan               AAGCTTCACCGGCGCAATTATCCTCATAATCGCCCACGGACTTACA
%Gorilla           AAGCTTCACCGGCGCAGTTGTTCTTATAATTGCCCACGGACTTACA
%Pongo             AAGCTTCACCGGCGCAACCACCCTCATGATTGCCCATGGACTCACA
%Hylobates         AAGCTTTACAGGTGCAACCGTCCTCATAATCGCCCACGGACTAACC
%Macaca_fuscata    AAGCTTTTCCGGCGCAACCATCCTTATGATCGCTCACGGACTCACC
%M_mulatta         AAGCTTTTCTGGCGCAACCATCCTCATGATTGCTCACGGACTCACC
%M_fascicularis    AAGCTTCTCCGGCGCAACCACCCTTATAATCGCCCACGGGCTCACC
%M_sylvanus        AAGCTTCTCCGGTGCAACTATCCTTATAGTTGCCCATGGACTCACC
%Saimiri_sciureus	 AAGCTTCACCGGCGCAATGATCCTAATAATCGCTCACGGGTTTACT
%;
%end;
%
%begin assumptions;
%	charset firsthalf = 1-449;
%	charset secondhalf = 450-898;
%end;
%end;\end{verbatim}
%
%\medskip{}

%\begin{figure}
%
%\includeimage[width=\textwidth]{figures/ImportNexus}
%
%\caption{A screenshot of the import nexus file menu option in BEAUti. \TODO{Need to have the correct menu shortcuts for OS X!}}
%\label{fig:BEAUti_ImportNexus}
%\end{figure}

An \texttt{Add Partition} window (Figure \ref{fig:addPartition}) would pop up if the related package is installed. If you are using ``pure'' BEAST 2, you can go to the next paragraph.    
Otherwise, select \texttt{Add Alignment} and click \textbf{OK} to continue.

\begin{figure}
\centering	
\includeimage[width=0.5\textwidth]{figures/AddPartition}
\caption{Add Partition window (Only appear if related packages are installed).}
\label{fig:addPartition}
\end{figure}

If there is any coding overlaps in the partitions, the warning message window (Figure \ref{fig:warning}) will appear. Read and click \textbf{OK} to continue.

\begin{figure}	
\centering	
\includeimage[width=0.9\textwidth]{figures/warning}
\caption{Warning message window (Only appear if there is any coding overlaps in the partitions).}
\label{fig:warning}
\end{figure}

Once loaded, five character partitions are displayed in the main panel (Figure \ref{fig:BEAUTI_DataPartitions}). The alignment is divided into a protein coding part and a non-coding part,and the coding part is divided in codon positions 1, 2 and 3. 
{\bf You must remove the `coding' partition before continuing to the next step as it refers to the same nucleotides as partitions `1stpos', `2ndpos' and `3rdpos'.} To remove the `coding' partition select the row and click the `-' button at the bottom of the table. 
You can view the alignment by double clicking the partition.

\begin{figure}
\centering	
\includeimage[width=0.9\textwidth]{figures/BEAUti_DataPartitions}
\caption{A screenshot of the data tab in BEAUti. This and all following screenshots
were taken on an Apple computer running Mac OS X and will look slightly different on other operating systems.}
\label{fig:BEAUTI_DataPartitions}
\end{figure}

%\begin{figure}[h]
%
%\includeimage[width=\textwidth]{figures/AlignmentViewer}
%
%\caption{A screenshot of the alignment viewer in BEAUti.}
%\label{fig:BEAUTI_AlignmentViewer}
%\end{figure}

\subsection*{Link/Unlink partition models}

\begin{figure}
\centering	
\includeimage[width=0.9\textwidth]{figures/BEAUti_DataPartitions_final}
\caption{A screenshot of the Partitions tab in BEAUti after linking and renaming the clock model and tree.}
\label{fig:BEAUti_DataPartitions_final}
\end{figure}

Since the sequences are linked (i.e. they are all from the mitochondrial genome
which is not believed to undergo recombination in birds and mammals) they
share the same ancestry, so the partitions should share the same time-tree in
the model. For the sake of simplicity, we will also assume the partitions share
the same evolutionary rate for each branch, and hence the same ``clock model''.
We will restrict our modelling of rate heterogeneity to among-site heterogeneity
within each partition, and also allow the partitions to have different mean rates of evolution. 

So, at this point we will need to link the clock model and tree. In the {\bf Partitions} panel, select all four partitions in the table (or none, by default all partitions are affected) and click the \texttt{Link Trees} button and then the \texttt{Link Clock Models} button (see Figure \ref{fig:BEAUti_DataPartitions_final}). Then click on the first drop-down menu in the Clock Model column and rename the shared clock model to `clock'. Likewise rename the shared tree to `tree'. This will make following options and generated log files more easy to read.


\subsection{Setting the substitution model}

The next step is to set up the substitution model. 
%First we will temporarily link the site models in the Partitions panel so that we can change the model of all partitions simultaneously. 
Then, select the {\bf Site Models} tab at the top of the main window (we skip the {\bf Tip Dates} tab since all taxa are from contemporary
samples). This will reveal the evolutionary model settings for BEAST. The options available depend on whether the data are nucleotides, or amino acids, binary data, or general data. The settings that will appear after loading the primate nucleotide alignment will be the default values for nucleotide data so we need to make some changes. 

\begin{figure}
\centering	
\includeimage[width=0.9\textwidth]{figures/BEAUti_Model}
\caption{A screenshot of the site model tab in BEAUti.}
\label{fig:BEAUti_Model}
\end{figure}

Most of the models should be familiar to you. %(see Chapter {\ref{chapter:SubstitutionModels} for details}). 
First, set the \textbf{Gamma Category Count} to 4 and then tick the `estimate' box for the \textbf{Shape} parameter. This will allow rate variation 
between sites in each partition to be modelled.  Note that 4 to 6 categories works sufficiently well for most data
sets, while having more categories takes more time to compute for little added benefit. We leave the Proportion Invariant entry set to zero.

Then select  \textbf{HKY} from the \textbf{Subst Model} drop-down menu. Ideally, a substitution model should be selected that fit the data best for each partition, but here for the sake of simplicity we use HKY for all partitions. Further, select \textbf{Empirical} from the \textbf{Frequencies} drop-down menu. This will fix the frequencies to the proportions observed in the data (for each partition individually, once we unlink the site models). This approach means that we can get a good fit to the data without explicitly estimating these parameters. We do it here simply to make the log files a bit shorter and more readable in later parts of the exercise. 

\begin{figure}
\centering	
\includeimage[width=0.9\textwidth]{figures/cloneFrom}
\caption{clone configuration from one site model to others.}
\label{fig:cloneFrom}
\end{figure}

Finally check the `estimate' box for the \textbf{Substitution rate} parameter and select the \textbf{Fix mean mutation rate} check box. This will allow the individual partitions to have their relative rates estimated for unlinked the site models (Figure \ref{fig:BEAUti_Model}).

%Now, return to the {\bf Partitions} panel and unlink the site models so that each partition has its own named site model with independent substitution model parameters and relative rate.

At last, hold `shift' key to select all site models on the left side, and click \textbf{OK} to clone the setting from \textbf{noncoding} into \textbf{1stpos}, \textbf{2ndpos} and \textbf{3rdpos} (Figure \ref{fig:cloneFrom}). Go through each site model, as you can see, their configurations are same now. 


\subsection{Setting the clock model}

The next step is to select the {\bf Clock Models} tab at the top of the
main window. This is where we select the molecular clock model. For this exercise we are going to leave the selection at the {\it default} value of a \textbf{strict molecular clock}, because this data is very clock-like, and does not need rate variation among branches to be included in the model.

To test for clock-likeness, you can (i) run the analysis with
a relaxed clock model and check how much variation among rates are implied
by the data (see coefficient of variation for more on this), or
(ii) perform a model comparison between a strict and relaxed clock using path
sampling, or (iii) use a random local clock model \cite{drummond2010bayesian}
which explicitly considers whether each branch in the tree needs its own branch
rate.

%\begin{figure}
%\includeimage[width=\textwidth]{figures/BEAUti_Clock}
%\caption{A screenshot of the clock model tab in BEAUti.}
%\label{fig:BEAUti_Clock}
%\end{figure}

\subsection{Priors }

The {\bf Priors} tab allows priors to be specified for each parameter in the
model. The model selections made in the site model and clock model tabs, result in the inclusion of various parameters
in the model, and these are shown in the priors tab (see Figure \ref{fig:BEAUti_Prior1}).

\begin{figure}
\centering
\includeimage[width=\textwidth]{figures/BEAUti_Prior1}
\caption{A screenshot of the Priors tab in BEAUti. }
\label{fig:BEAUti_Prior1}
\end{figure}

Here we also specify that we wish to use the Calibrated Yule model \cite{Heled:2012fk}
as the tree prior. The Yule model is a simple model of
speciation that is generally more appropriate when considering sequences from
different species. %(see Section \ref{section:YuleBirthDeathModels} for details).
Select the \texttt{Calibrated Yule Model} from the {\bf Tree prior} dropdown menu.

%\begin{figure}
%\includeimage[width=\textwidth]{figures/BEAUti_Tree}
%\caption{A screenshot of the tree prior drop down menu in BEAUti.}
%\label{fig:BEAUti_Tree}
%\end{figure}

\subsubsection{Defining the calibration node}

We now need to specify a prior distribution on the calibrated node, based on our
fossil knowledge. This is known as calibrating our tree. To define an extra prior, press the small {\bf +} button below list of priors. If it is not visible in your view, please scroll down the panel to the bottom to find {\bf +} button. 
You will see a dialog that allows you to define a subset of the taxa in the phylogenetic tree. Once you have created a taxa set you will be able to add calibration information for its most recent common
ancestor (MRCA) later on. 

Name the taxa set by filling in the taxon set label entry. 
Call it \texttt{human-chimp}, since it will contain the taxa for {\it Homo sapiens} and {\it Pan}. 
In the list below you will see the available taxa. Select each of the two taxa in turn and press the $> >$ arrow button. (Figure \ref{fig:BEAUti_TaxonSets}).
Click OK and the newly defined taxa set will be added in to the prior list.
As this is a calibrated node to be used in conjunction with the Calibrated Yule prior, monophyly must be enforced, so select the checkbox marked \texttt{Monophyletic}. This will constrain the tree topology so that the human-chimp grouping is kept monophyletic during the course of the MCMC analysis.

\begin{figure}
\centering
\includeimage[width=9cm]{figures/BEAUti_TaxonSets}
\caption{Taxon set editor in BEAUti.}
\label{fig:BEAUti_TaxonSets}
\end{figure}

To encode the calibration information we need to specify a distribution for the MRCA of human-chimp.
Select the \textbf{Log-normal} distribution from the drop down menu to the right of the newly added \texttt{human-chimp.prior}. 
Click on the black triangle and a graph of the probability density function will appear, along with parameters for the log normal distribution.
We are going to set $M=1.78$ and $S=0.085$ which will specify a distribution centred at about 6 million
years with a standard deviation of about 0.5 million years. This will give
a central 95\% probability range covering 5-7 Mya. This roughly corresponds to the current consensus
estimate of the date of the most recent common ancestor of humans and chimpanzees (Figure \ref{fig:BEAUti_Prior2}).

%We now need to specify a prior distribution on the calibrated node, based on our prior fossil knowledge. This is known
%as calibrating our tree. Select the \textbf{Log Normal} distribution from the drop down menu to the right of the newly added \texttt{human-%chimp.prior}. Click on the black triangle to the right and a graph of the probability density function will appear, along with parameters for the log-%normal distribution.
%We are going to specify a normal distribution centered at 6 million years with a standard deviation of 0.5 million years. This will give a central 95\% range of about 5-7 My. 
%This roughly corresponds to the current consensus estimate of the date of the most recent common ancestor of humans and chimpanzees (Figure %\ref{fig:BEAUti_TaxonSets}).

\begin{figure}
\centering	
\includeimage[width=\textwidth]{figures/BEAUti_Prior2}
\caption{A screenshot of the calibration prior options in the Priors panel in BEAUti.}
\label{fig:BEAUti_Prior2}
\end{figure}

We should convince ourselves that the priors shown in the priors panel really reflect the prior information we have about the parameters of the model. Finally we will also specify some diffuse ``uninformative'' but proper priors on the overall molecular clock rate (\texttt{clockRate}) and the speciation rate (\texttt{birthRateY}) of the Yule tree prior. For each of them, select \textbf{Gamma} from the drop-down menu
and using the arrow button, expand the view to reveal the parameters of the
Gamma prior. For both the clock rate and the Yule birth rate set the Alpha
(shape) parameter to 0.001 and the Beta (scale) parameter to 1000 (Figure \ref{fig:GammaPrior}).

By default each of the gamma shape parameters has an exponential prior
distribution with a mean of 1. This implies (see Figure 3.7) we expect some
rate variation. By default the kappa parameters for the HKY model have a log
normal(1,1.25) prior distribution, which broadly agrees with empirical evidence
\cite{rosenberg2003patterns} on the range of realistic values for transition/transversion
bias. These default priors are kept since they are suitable for this particular
analysis.

\begin{figure}
\centering
\includeimage[width=\textwidth]{figures/GammaPrior}
\caption{Gamma prior.}
\label{fig:GammaPrior}
\end{figure}

%Finally, put a log-normal distribution on the kappa parameters. \TODO{with what M and S parameters?}
%The clock model parameters will appear when the clock rate is estimated. 
%The priors table should now look like Figure \ref{fig:BEAUti_Prior2}.

%\begin{figure}
%\includeimage[width=\textwidth]{figures/BEAUti_Prior2}
%\label{fig:BEAUti_Prior2}
%\caption{A screenshot of the newly added calibration prior in the Priors panel in BEAUti. \TODO{This figure needs to be changed to remove the HomiCerco.prior.}}
%\end{figure}

\subsection{Setting the MCMC options}

%Ignore the \textbf{Operators} tab as this just contains technical
%settings affecting the efficiency of the MCMC program (see Notes for details). 

The next tab, {\bf MCMC}, provides more general
settings to control the length of the MCMC run and the file names. 

Firstly we have the \textbf{Chain Length}. This is the number of
steps the MCMC will make in the chain before finishing. How long this
should be depends on the size of the data set, the complexity of the
model and the quality of answer required. The default value of 10,000,000
is entirely arbitrary and should be adjusted according to the size
of your data set. For this data set let's set the \texttt{chain
length} to \underline{\chainLength{}} as this will run reasonably quickly on most modern
computers (a few minutes).

%\begin{figure}
%\includeimage[width=\textwidth]{figures/BEAUti_MCMC}
%\label{fig:BEAUti_MCMC}
%\caption{A screenshot of the MCMC panel in BEAUti.}
%\end{figure}

The \textbf{Store Every} field determines how often the state is stored to file. Storing
the state periodically is useful for situations where the computing environment
is not very reliable and a BEAST run can be interrupted. Having a stored copy
of the recent state allows you to resume the chain instead of restarting from the
beginning, so you do not need to get through burn-in again.
The \textbf{Pre Burnin} field specifies the number of samples that are not logged at the very beginning of
the analysis. We leave the \textbf{Store Every} and \textbf{Pre Burnin} fields set to their default
values. Below these are the details of the log files. Each one can be expanded by
clicking the black triangle.

The next options specify how often the parameter values in the Markov
chain should be displayed on the screen and recorded in the log file.
The screen output is simply for monitoring the programs progress so
can be set to any value (although if set too small, the sheer quantity
of information being displayed on the screen will actually slow the
program down). For the log file, the value should be set relative
to the total length of the chain. Sampling too often will result in
very large files with little extra benefit in terms of the accuracy
of the analysis. Sample too infrequently and the log file will not
record sufficient information about the distributions of the parameters. 
You probably want to aim to store no more than 10,000 samples so this should be
set to no less than chain length / 10,000.

For this exercise we will set the \texttt{trace log} and the \texttt{tree log} frequency to \underline{\logEvery{}}
and the \texttt{screen log} to \underline{\screenEvery{}}.
Also specify \texttt{Primates.log} as the file name of the trace log file
and \texttt{Primates.trees} as the file name of the tree log file.
Make sure that the file name filed of the screen log is left empty,
or the screen log will not be written to the screen.

\begin{itemize}
\item If you are using the Windows operating system then we suggest you add the suffix \texttt{.txt} to both of these (so,
\texttt{Primates.log.txt} and \texttt{Primates.trees.txt}) so that Windows recognizes
these as text files. 
\end{itemize}

\subsection{Generating the BEAST XML file }

We are now ready to create the BEAST XML file. To do this,
select the \textbf{Save} option from the \textbf{File} menu. 
Check the default priors, and save the file with an appropriate name
(we usually end the filename with \texttt{.xml}, i.e., \texttt{Primates.xml}).
We are now ready to run the file through BEAST. 

\section{Running BEAST }

\begin{figure}
\centering	
\includeimage[width=0.7\textwidth]{figures/BEAST}
\caption{A screenshot of BEAST.}
\label{fig:BEAST}
\end{figure}

Now run BEAST and when it asks for an input file, provide your newly
created XML file as input. BEAST will then run until it has finished
reporting information to the screen. The actual results files are
save to the disk in the same location as your input file. The output to the screen will
look something like this: 

{\tiny   
\begin{verbatim}

             BEAST v2.2.0, 2002-2014
       Bayesian Evolutionary Analysis Sampling Trees
                 Designed and developed by
Remco Bouckaert, Alexei J. Drummond, Andrew Rambaut and Marc A. Suchard
                              
               Department of Computer Science
                   University of Auckland
                  remco@cs.auckland.ac.nz
                  alexei@cs.auckland.ac.nz
                              
             Institute of Evolutionary Biology
                  University of Edinburgh
                     a.rambaut@ed.ac.uk
                              
              David Geffen School of Medicine
           University of California, Los Angeles
                     msuchard@ucla.edu
                              
                Downloads, Help & Resources:
                    	http://beast2.org/
                              
Source code distributed under the GNU Lesser General Public License:
             	http://github.com/CompEvol/beast2
                              
                     BEAST developers:
	Alex Alekseyenko, Trevor Bedford, Erik Bloomquist, Joseph Heled, 
	Sebastian Hoehna, Denise Kuehnert, Philippe Lemey, Wai Lok Sibon Li, 
	Gerton Lunter, Sidney Markowitz, Vladimir Minin, Michael Defoin Platel, 
          	Oliver Pybus, Chieh-Hsi Wu, Walter Xie
                              
                         Thanks to:
    	Roald Forsberg, Beth Shapiro and Korbinian Strimmer


Random number seed: 777

12 taxa
898 sites
413 patterns
TreeLikelihood uses beast.evolution.likelihood.BeerLikelihoodCore4
TreeLikelihood uses beast.evolution.likelihood.BeerLikelihoodCore4
TreeLikelihood uses beast.evolution.likelihood.BeerLikelihoodCore4
TreeLikelihood uses beast.evolution.likelihood.BeerLikelihoodCore4
===============================================================================
Citations for this model:

Bouckaert RR, Heled J, Kuehnert D, Vaughan TG, Wu C-H, Xie D, Suchard MA,
  Rambaut A, Drummond AJ (2014) BEAST 2: A software platform for Bayesian
  evolutionary analysis. PLoS Computational Biology 10(4): e1003537

Heled J, Drummond AJ (2012) Calibrated Tree Priors for Relaxed Phylogenetics
  and Divergence Time Estimation. Systematic Biology 61(1):138-149.

Hasegawa M, Kishino H, Yano T (1985) Dating the human-ape splitting by a
  molecular clock of mitochondrial DNA. Journal of Molecular Evolution
  22:160-174.

===============================================================================
Writing file /Primates.log
Writing file /Primates.trees
         Sample      posterior ESS(posterior)     likelihood          prior
              0     -7924.3599              N     -7688.4922      -235.8676 --
          10000     -5529.0700         2.0        -5459.1993       -69.8706 --
          20000     -5516.8159         3.0        -5442.3372       -74.4786 --
          30000     -5516.4959         4.0        -5439.0839       -77.4119 --
          40000     -5521.1160         5.0        -5445.6047       -75.5113 --
          50000     -5520.7350         6.0        -5444.6198       -76.1151 --
          60000     -5512.9427         7.0        -5439.2561       -73.6866 2m39s/Msamples
          70000     -5513.8357         8.0        -5437.9432       -75.8924 2m39s/Msamples
          ...
        5990000     -5516.6832       474.6        -5442.5945       -74.0886 2m40s/Msamples
        6000000     -5512.3802       472.2        -5440.8928       -71.4874 2m40s/Msamples

Operator                                                     Tuning   #accept   #reject     total  prob.acc
ScaleOperator(treeScaler.t:tree)                              0.703     39935    174155    214090     0.187 
ScaleOperator(treeRootScaler.t:tree)                          0.644     37329    177166    214495     0.174 
Uniform(UniformOperator.t:tree)                                        479419   1668915   2148334     0.223 
SubtreeSlide(SubtreeSlide.t:tree)                             9.922    272787    801404   1074191     0.254 
Exchange(narrow.t:tree)                                                   744   1074261   1075005     0.001 
Exchange(wide.t:tree)                                                       9    214594    214603     0.000 
WilsonBalding(WilsonBalding.t:tree)                                         4    214548    214552     0.000 
ScaleOperator(KappaScaler.s:noncoding)                        0.352      1739      5375      7114     0.244 
DeltaExchangeOperator(FixMeanMutationRatesOperator)           0.425     17277    126203    143480     0.120 
ScaleOperator(gammaShapeScaler.s:noncoding)                   0.375      1729      5428      7157     0.242 
ScaleOperator(CalibratedYuleBirthRateScaler.t:tree)           0.245     58005    156128    214133     0.271 
ScaleOperator(StrictClockRateScaler.c:clock)                  0.706     50080    164952    215032     0.233 
UpDownOperator(strictClockUpDownOperator.c:clock)             0.589     50809    163882    214691     0.237 
ScaleOperator(KappaScaler.s:1stpos)                            0.44      1816      5388      7204     0.252 
ScaleOperator(gammaShapeScaler.s:1stpos)                       0.42      1927      5129      7056     0.273 
ScaleOperator(KappaScaler.s:2ndpos)                           0.332      1964      5301      7265     0.270 
ScaleOperator(gammaShapeScaler.s:2ndpos)                      0.303      2033      5177      7210     0.282 
ScaleOperator(KappaScaler.s:3rdpos)                           0.505      1424      5860      7284     0.195 
ScaleOperator(gammaShapeScaler.s:3rdpos)                      0.267      1569      5536      7105     0.221 

Total calculation time: 964.067 seconds
\end{verbatim}}

Note that there is some useful information at the start concerning the alignments and which tree likelihoods are used. Also, all citations relevant for the analysis are mentioned at the start of the run, which can easily be copied to
manuscripts reporting about the analysis. Then follows reporting of the chain,
which gives some real time feedback on progress of the chain.

At the end, an operator analysis is printed, which lists all operators used in the
analysis together with how often the operator was tried, accepted, and rejected
(see columns \#total, \#accept and \#reject respectively). The acceptance rate is
the proportion of times an operator is accepted when it is selected for doing a
proposal. In general, an acceptance rate that is high, say over 0.5 indicates the
proposals are conservative and do not explore the parameter space efficiently. On
the other hand a low acceptance rate indicates that proposals are too aggressive
and almost always result in a state that is rejected because of its low posterior.
Both too high and too low acceptance rates result in low ESS values. 
An acceptance rate of 0.234 is the target (based on very limited evidence provided by \cite{gelman1996efficient}) for many (but not all) operators implemented in BEAST.

Some operators have a tuning parameter, for example the scale factor of a
scale parameter. If the final acceptance rate is not near the target, BEAST will
suggest a new value for the tuning parameter, which is printed in the operator analysis. In this case, all acceptance rates are good for the operators that have tuning parameters. Operators without tuning parameters include the wide
exchange and Wilson-Balding operators for this analysis. Both these operators
attempt to change the topology of the tree with large steps, but since the data
supports a single topology overwhelmingly, these radical proposals are almost
always rejected.

\section{Analyzing the results}

\begin{figure}
\centering	
\includeimage[width=0.9\textwidth]{figures/Tracer1}
\caption{A screenshot of Tracer v{\TracerVersion}.}
\label{fig:Tracer1}
\end{figure}

Run the program called {\bf Tracer} to analyze the output of BEAST. When the main
window has opened, choose {\bf Import Trace File...} from the {\bf File} menu and select the file that
BEAST has created called \texttt{Primates.log} (Figure \ref{fig:Tracer1}).

Remember that MCMC is a stochastic algorithm so the actual numbers will not be exactly the same as those depicted in the figure.

On the left hand side is a list of the different quantities that BEAST has logged to file. 
There are traces for the posterior (this
is the natural logarithm of the product of the tree likelihood and the prior density), and the continuous parameters. Selecting a trace
on the left brings up analyses for this trace on the right hand side depending on tab that is selected. When first opened, the
`posterior' trace is selected and various statistics of this trace are shown under the Estimates tab.
In the top right of the window is a table of calculated statistics for the selected trace. 

Select the \texttt{clockRate} parameter in the lefthand list to look at
the average rate of evolution (averaged over the whole tree and all sites). Tracer will plot a (marginal posterior) histogram for the selected statistic and also give you
summary statistics such as the mean and median. The 95\% HPD stands for {\it highest posterior density interval} and represents the most compact interval on the selected parameter that contains 95\% of the posterior probability. It can be loosely thought of as a Bayesian analog to a confidence interval. The \texttt{TreeHeight} parameter gives the marginal posterior distribution of the age of the root of the entire tree.

Select the \texttt{TreeHeight} parameter and then Ctrl-click \texttt{mrcatime(human-chimp)}  (Command-click on Mac OS X). This will show a display of the age of the root and the calibration MRCA we specified earlier in BEAUti. You can verify that the divergence that we used to calibrate the tree
(\texttt{mrcatime(human-chimp)}) has a posterior distribution that matches the prior distribution we specified (Figure \ref{fig:Tracer_divergences}).

\begin{figure}
\centering	
\includeimage[width=0.9\textwidth]{figures/Tracer_divergences}
\caption{A screenshot of the 95\% HPD intervals of the root height and the user-specified (human-chimp) MRCA in Tracer.}
\label{fig:Tracer_divergences}
\end{figure}

\section{Marginal posterior estimates}

To show the relative rates for the four partitions, select the mutationRate parameter for each of the four partitions, and select the marginal density tab in Tracer.
Figure \ref{fig:Tracer_marginalDensity} shows the marginal densities for the relative substitution rates. The
plot shows that codon positions 1 and 2 have substantially different rates (0.456
versus 0.183) and both are far slower than codon position 3 with a relative rate of
2.941. The noncoding partition has a rate intermediate between codon positions
1 and 2 (0.346). Taken together this result suggests strong purifying selection in
both the coding and noncoding regions of the alignment.

%Likewise, a marginal posterior estimate can be obtained for the gamma shape
%parameter and the kappa parameter, which are shown in Figures 6.10 and 6.11
%respectively. The plot for the gamma shape parameter suggest that there is
%considerable rate variation for all of the partitions with the least rate variation
%in the third codon position.

%The plot for the kappa parameter (Figure 6.11) shows that all partitions show
%considerable transition/transversion bias, but that the third codon position in particular has a high bias with a mean of almost 29 more transitions than transversions.

\begin{figure}
\centering	
\includeimage[width=\textwidth]{figures/Tracer_marginalDensity}
\caption{A screenshot of the marginal posterior densities of the relative substitution rates of the four partitions (relative to the site-weighted mean rate).}
\label{fig:Tracer_marginalDensity}
\end{figure}


%\if0
\begin{figure}
\centering	
\includeimage[width=\textwidth]{figures/primatePriorPosteriorKappa}

%\include{primatePriorPosteriorShape}
\caption{The marginal prior and posterior densities for the shape ($\alpha$) parameters. The prior is in gray. The posterior density estimate for each partition is also shown: noncoding (orange) and first (red), second (green) and third (blue) codon positions.}
\label{fig:primatePriorPosteriorShape}
%\end{figure}

%\begin{figure}
%\centering	
\includeimage[width=\textwidth]{figures/primatePriorPosteriorShape}

%\input{primatePriorPosteriorKappa}
\caption{The marginal prior and posterior densities for the transition/tranversion bias ($\kappa$) parameters. The prior is in gray. The posterior density estimate for each partition is also shown: noncoding (orange) and first (red), second (green) and third (blue) codon positions.}
\label{fig:primatePriorPosteriorKappa}
\end{figure}
%\fi

\newpage
\subsection*{Questions}
\vspace{5 mm}

\textit{What is the estimated rate of molecular evolution for this gene tree (include the 95\% HPD interval)?}

%\vspace{5 mm}
%\framebox(420,30){}
%\vspace{5 mm}

\textit{What sources of error does this estimate include?}

%\vspace{5 mm}
%\framebox(420,30){}
%\vspace{5 mm}

%\item Does the rate of evolution differ substantially amongst different lineages in the tree?

\textit{How old is the root of the tree (give the mean and the 95\% HPD range)?}

%\vspace{5 mm}
%\framebox(420,30){}
%\vspace{5 mm}
  
%The \texttt{coefficientOfVariation} statistic gives a summary of how much the rate of evolution varies from lineage to lineage (expressed as a proportion of the mean rate).
%   
%\bigskip{}
%
%Selecting the \texttt{treeModel.rootHeight} parameter gives the marginal posterior distribution of the age of the root of entire tree (i.e., the divergence between feline papillomavirus and canine oral %papillomavirus).
%
%\bigskip{}
%
%\textit{How old is the root of the tree (give the mean and the HPD range)?}
%
%\vspace{5 mm}
%\framebox(420,30){}
%\vspace{5 mm}
\newpage

\section{Obtaining an estimate of the phylogenetic tree}

BEAST also produces a posterior sample of phylogenetic time-trees along with its sample of parameter estimates. 
These need to be summarized using the program {\bf TreeAnnotator}. This will take the set of trees and find the best
supported one. It will then annotate this representative summary tree with the mean ages of all the
nodes and the corresponding 95\% HPD ranges. It will also calculate the posterior clade probability for each
node. Run the TreeAnnotator program and set it up as depicted in Figure \ref{fig:TreeAnnotator1}.

\begin{figure}
\centering	
\includeimage[width=0.9\textwidth]{figures/TreeAnnotator1}
\caption{A screenshot of TreeAnnotator.}
\label{fig:TreeAnnotator1}
\end{figure}

The burnin is the number of trees to remove from the start of the sample. Unlike {\bf Tracer} which specifies the number of
steps as a burnin, in {\bf TreeAnnotator} you need to specify the actual number of trees. For this run, you specified a chain
length of \chainLength{} steps sampling every \logEvery{} steps. Thus the trees file will contain \tracesNumber{} trees and so to specify a 10\% burnin
in the top text field.

The {\bf Posterior probability limit} option specifies a limit such that if a node is found at less than this frequency in the sample
of trees (i.e., has a posterior probability less than this limit), it will not be annotated. The default of 0.5 means that only nodes
seen in the majority of trees will be annotated. Set this to zero to annotate all nodes.

The {\bf Target tree type} specifies the tree topology that will be annotated. You can either choose a specific tree from a file or ask TreeAnnotator to find a tree in your sample.
The default option, {\bf Maximum clade credibility tree}, finds the tree with the highest product of the posterior probability of
all its nodes.

For node heights, the default is Common Ancestor Heights, which calculates
the height of a node as the mean of the MRCA time of all pairs of nodes in
the clade. For trees with large uncertainty in the topology and thus many clades
with low support, some other methods can result in trees with negative branch
lengths. In this analysis, the support for all clades in the summary tree is very
high, so this is no issue here.
Choose {\bf Mean heights} for node heights. This sets the heights (ages) of each node in the tree to the mean height across the
entire sample of trees for that clade.

For the input file, select the trees file that BEAST created and select a file for the
output (here we called it \mccTree{}). Now press Run and wait for the program to finish.

\section{Visualizing the tree estimate}

Finally, we can visualize the tree in another program called {\bf FigTree}. Run this program, and open
the \mccTree{} file by using the Open command in the File menu. The tree should appear.
You can now try selecting some of the options in the control panel on the left. 
First of all, expend {\bf Trees} option in the panel, and check {\bf Order nodes} and choose \textit{Ordering} by \textit{decreasing}. 
Try selecting {\bf Node Bars} to get node age error bars. Also turn on {\bf Branch Labels} and select {\bf posterior} to get
it to display the posterior probability for each node. 
If you use a non strict clock model then under {\bf Appearance} you can also tell FigTree to colour the branches by the rate.
You should end up with something similar to Figure \ref{fig:FigTree}.

\begin{figure}
\centering	
\includeimage[width=0.9\textwidth]{figures/FigTree}
\includeimage[width=0.9\textwidth]{figures/DensiTree}
\caption{A screenshot of FigTree and DensiTree.}
\label{fig:FigTree}
\end{figure}

An alternative view of the tree can be made with DensiTree, which is part of Beast 2. The advantage
of DensiTree is that it is able to visualize both uncertainty in node heights and uncertainty in topology.
For this particular dataset, the dominant topology is present in more than 99\% of the samples. So, 
we conclude that this analysis results in a very high consensus on topology (Figure \ref{fig:FigTree}).


%\begin{figure}
%\begin{center}
%\includeimage[width=0.7\textwidth]{figures/DensiTree}
%\end{center}
%\caption{A screenshot of DensiTree.}
%\label{fig:DensiTree}
%\end{figure}

\newpage
\subsection*{Questions}
%\vspace{5 mm}

\begin{enumerate}
\item \textit{Does the rate of evolution differ substantially amongst different lineages in the tree?}

%\vspace{5 mm}
%\framebox(420,60){}
%\vspace{5 mm}

\item DensiTree has a clade bar (Menu Window/View clade toolbar) to show information on clades.

\textit{What is the support for the clade [Homo\_sapiens, Pan, Gorilla, Hylobates]?}

%\vspace{5 mm}
%\framebox(420,30){}
%\vspace{5 mm}

\item You can browse through the topologies in DensiTree using the Browse menu.
The most popular topology has a support of over 99\%.

\textit{What is the support for the second most popular topology?}

%\vspace{5 mm}
%\framebox(420,30){}
%\vspace{5 mm}

\item Under the help menu, DensiTree shows some information.

\textit{How many topologies are in the tree set?}

%\vspace{5 mm}
%\framebox(420,30){}
%\vspace{5 mm}

\end{enumerate}

\newpage

\section{Comparing your results to the prior}

It is a good idea to rerun the analysis while sampling from the prior to make sure
that interactions between priors are not affecting your prior information. The
interaction between priors can be problematic especially when using calibrations
since it means putting multiple priors on the tree.

Using BEAUti, set up the same analysis but under the MCMC options, select the {\bf Sample from prior only} option. This will allow you to visualize the full prior distribution in the absence of your sequence data. Summarize the trees from the full prior
distribution and compare the summary to the posterior summary tree.

Divergence time estimation using ``node dating" of the type described in this
chapter has been applied to answer a variety of different questions in ecology
and evolution. For example, node dating with fossils was used in determining
the species diversity of cycads \cite{nagalingum2011recent}, analysing the rate of
evolution in flowering plants \cite{smith2008rates}, and investigating the
origins of hot and cold desert cyanobacteria \cite{bahl2011ancient}.

% \renewcommand{\refname}{Bibliography}% if you prefer this heading
\bibliographystyle{amsplain} 
\bibliography{DivergenceDatingTutorial}








%\newpage


\end{document}


